Adjustable Probability Density Grid-Based Clustering for Uncertain Data Streams
Authors
Abstract
Most existing grid-based clustering algorithms for uncertain data streams use a fixed meshing method and therefore suffer from low clustering accuracy. To address this deficiency, this paper proposes a novel algorithm, APDG-CUStream (Adjustable Probability Density Grid-based Clustering for Uncertain Data Streams), which consists of an online component and an offline component. In the online component, the Probability Density Grid Clustering Feature is defined to store summary information about the uncertain data stream, and a time decay factor introduced into the definition of the probability reduces the influence of outdated data on the clustering results. In the offline component, the Init_clustering algorithm is called at specified time intervals; it first adjusts the sparse probability density grid units and updates the clustering features of all probability density grid units. For each dense probability density grid, it then finds and merges all dense or medium neighboring probability density grids connected with that grid, which yields the Init_clustering results. Finally, APDG-CUStream returns the final clustering results. The experimental results show that APDG-CUStream obtains clusters of arbitrary shape accurately and quickly, and achieves better clustering quality.
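The two mechanisms named in the abstract, time-decayed grid densities and the merging of dense grids with their dense or medium neighbors, can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the exponential decay form and the constant `DECAY`, the thresholds `DENSE_MIN` and `MEDIUM_MIN`, the fixed cell width, the axis-aligned neighborhood, the choice to expand through medium cells as well as dense ones, and all function names. The adjustment of sparse grid units is not modeled.

```python
from collections import deque

# All constants below are illustrative assumptions, not values from the paper.
DECAY = 0.9        # per-time-unit decay base applied to stored densities
DENSE_MIN = 2.0    # decayed density at or above this: dense cell
MEDIUM_MIN = 1.0   # decayed density at or above this (below DENSE_MIN): medium cell


def cell_key(point, width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // width) for x in point)


def add_point(grid, point, prob, t, width=1.0):
    """Online step: decay the cell's stored density to time t, then add the
    existence probability of the newly arrived uncertain point."""
    key = cell_key(point, width)
    density, last = grid.get(key, (0.0, t))
    grid[key] = (density * DECAY ** (t - last) + prob, t)


def density_at(grid, key, now):
    """Density of a cell decayed from its last update time to `now`."""
    density, last = grid[key]
    return density * DECAY ** (now - last)


def neighbours(key):
    """Cells that differ by one step along exactly one axis (assumed neighborhood)."""
    for i in range(len(key)):
        for step in (-1, 1):
            yield key[:i] + (key[i] + step,) + key[i + 1:]


def merge_clusters(grid, now):
    """Offline step: start from each unlabeled dense cell and absorb every
    connected dense or medium cell by breadth-first search."""
    current = {k: density_at(grid, k, now) for k in grid}
    labels = {}
    next_id = 0
    for seed in sorted(current):
        if current[seed] < DENSE_MIN or seed in labels:
            continue
        labels[seed] = next_id
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            for nb in neighbours(cell):
                if nb in current and nb not in labels and current[nb] >= MEDIUM_MIN:
                    labels[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return labels
```

In this sketch, sparse cells never receive a label, so they are left out of every cluster. Because clusters grow cell by cell along neighboring grids, their shapes follow the data rather than a fixed geometric form.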
Similar resources
Probability Density Grid-based Online Clustering for Uncertain Data Streams
Most existing stream clustering algorithms adopt an online component and an offline component. The disadvantage of such two-phase algorithms is that they cannot generate the final clusters online, and accurate clustering results must be obtained through offline analysis. Furthermore, the clustering algorithms for uncertain data streams are unable to find clusters of arbitrary shapes accordi...
DENGRIS-Stream: A Density-Grid based Clustering Algorithm for Evolving Data Streams over Sliding Window
Evolving data streams are ubiquitous. Various clustering algorithms have been developed to extract useful knowledge from evolving data streams in real time. Density-based clustering methods can handle outliers and discover clusters of arbitrary shape, whereas grid-based clustering offers fast processing. The sliding window is a widely used model for data stream mining due to its e...
Research on Clustering Algorithm Based on Grid Density on Uncertain Data Stream
To address the clustering omissions that occur during the adjustment cycle of grid-density-based clustering on uncertain data streams, the paper proposes an algorithm named GCUDS, which clusters uncertain data streams using a grid structure. The concept of the data trend degree was defined to describe the degree to which a data point belongs to a given grid unit, and the defect of information loss around grid units wa...
Technique For Clustering Uncertain Data Based On Probability Distribution Similarity
Clustering uncertain data is one of the essential tasks in data mining. Traditional algorithms for clustering uncertain data, such as K-Means, UK-Means, and density-based clustering, are limited to geometric distance-based similarity measures and cannot capture differences between the distributions of uncertain data. Such methods cannot handle uncertain objects...
Density-Based Clustering Based on Probability Distribution for Uncertain Data
Large volumes of uncertain digital data are now being produced, and handling this data is very difficult. Commonly, the distance between uncertain object descriptions is expressed by a single numerical distance value. Clustering uncertain data is one of the essential and challenging tasks in mining uncertain data. Previous methods extend partitioning clustering methods like k-mean...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2011